Two nonparametric statistical methods, the normal scores method and the rank order transformation, are [...] for use in discriminant function analysis. The methods are [...] for both normal and non-normal distributions. When the distributions are normal, the rank and inverse normal scores methods are [...] substitutes for the linear discriminant function (LDF) and quadratic discriminant function (QDF). When the populations are non-normal, the LDF methods based on the ranks or the inverse normal scores are more effective than the LDF or QDF methods based on the raw data. Finally, when the criterion sample sizes are unequal, the inverse normal scores approach is more desirable than the rank approach. When the criterion sample sizes are equal, either of the two procedures can be used. (Author/JKS)

Behavioral science decisions frequently involve the rational assignment or classification of observations into one of a finite number of populations based on an evaluation of a series of measurements obtained on the observations. The set of statistical procedures that conventionally governs these decisions is known as discriminant analysis.

The theoretical basis for discriminant analysis was introduced by Welch (1939), who adapted the hypothesis testing concepts of Neyman and Pearson. Welch showed that a discrimination procedure that classified $p$-dimensional observations $Z$ into one of two populations $\Pi_1$ or $\Pi_2$ was equivalent to a partitioning of the sample space into two mutually exclusive and exhaustive regions, one for each population ($i = 1, 2$), obtained by evaluating the likelihood ratio function at $Z$. $Z$ is assigned to $\Pi_1$ when the value of the likelihood ratio is greater than some appropriately determined constant $k$, and to $\Pi_2$ when the value of the likelihood ratio is less than $k$.¹

It is possible that the classification decision for an observation could be in error; $Z$ could originate from any population whose density is non-zero at $Z$. In the two population model, which will be the focus of this paper, two errors of classification are possible:

1. The procedure can assign $Z$ to $\Pi_1$ when $Z$ actually belongs to $\Pi_2$.

2. The procedure can assign $Z$ to $\Pi_2$ when $Z$ actually belongs to $\Pi_1$.



¹ When the likelihood ratio equals $k$, the usual procedure is to randomly assign the observation to one of the populations.

Associated with each error is a probability of committing it (called the probability of misclassification), denoted by $P(Z \in \Pi_i \mid \Pi_j)$, ($i \neq j = 1, 2$).

Welch (1939) showed that for the two population situation, with observations drawn from known distributions, the optimal solution to the classification problem is

$$\frac{f_1(Z)}{f_2(Z)} \tag{1}$$

where $f_i(Z)$ is the density function of the $i$th distribution evaluated at $Z$. $Z$ is classified into $\Pi_1$ if (1) is greater than a constant $k$, and into $\Pi_2$ if (1) is less than $k$. Equation (1) is optimal in the sense that it minimizes $P(Z \in \Pi_i \mid \Pi_j)$. $k$ is defined as

$$k = \frac{C_{12}\, q_2}{C_{21}\, q_1} \tag{2}$$

where $C_{ij}$ ($i \neq j = 1, 2$) is the cost of misclassifying an observation from $\Pi_j$ into $\Pi_i$ and $q_i$ ($i = 1, 2$) is the a priori probability that the observation belongs to population $\Pi_i$ (Anderson, 1951).
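
As a minimal sketch of the rule in equations (1) and (2), the following Python fragment classifies a single observation by comparing the two densities against $k$. The function names, the univariate normal densities, and the unit costs and equal prior probabilities in the example are illustrative assumptions and are not taken from the study; ties are broken at random, following the footnote on the case where the likelihood ratio equals $k$.

```python
import random
from statistics import NormalDist


def classification_constant(c12, c21, q1, q2):
    """Return k = (C12 * q2) / (C21 * q1), the threshold in equation (2)."""
    return (c12 * q2) / (c21 * q1)


def classify_by_likelihood_ratio(z, f1, f2, k=1.0, rng=random):
    """Assign z to population 1 or 2 by comparing f1(z) / f2(z) with k.

    The comparison is written as f1(z) versus k * f2(z) so that a zero
    density in the denominator cannot raise a division error.
    """
    lhs, rhs = f1(z), k * f2(z)
    if lhs > rhs:
        return 1
    if lhs < rhs:
        return 2
    return rng.choice((1, 2))  # ratio equals k: assign at random


# Assumed illustration: univariate N(1, 1) versus N(0, 1), with equal
# costs and equal prior probabilities, so that k = 1.
pop1 = NormalDist(mu=1.0, sigma=1.0)
pop2 = NormalDist(mu=0.0, sigma=1.0)
k = classification_constant(c12=1.0, c21=1.0, q1=0.5, q2=0.5)

print(k)
print(classify_by_likelihood_ratio(0.8, pop1.pdf, pop2.pdf, k))
print(classify_by_likelihood_ratio(0.2, pop1.pdf, pop2.pdf, k))
```

Raising the cost or the prior probability attached to one population moves $k$ away from 1 and shifts the boundary between the two classification regions accordingly.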



Procedures for Multivariate Normal Distributions

Frequently data are collected that are representative of multivariate normal distributions. When the populations are so distributed with known mean vectors and identical covariance matrices, (1) simplifies to

$$Z'\Sigma^{-1}(\mu_1 - \mu_2) - \tfrac{1}{2}(\mu_1 + \mu_2)'\Sigma^{-1}(\mu_1 - \mu_2) \tag{3}$$

where $\Sigma$ is the common covariance matrix and $\mu_i$ ($i = 1, 2$) is the mean vector of $\Pi_i$. When (3) is greater than log($k$), $Z$ is classified as belonging to $\Pi_1$; when (3) is less than log($k$), $Z$ is classified as belonging to $\Pi_2$. Equation (3) is referred to as the Linear Discriminant Function (LDF).
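
The LDF with known parameters can be sketched for the two dimensional case as follows. This is an illustrative fragment, not the study's FORTRAN code: the helper names are hypothetical, the 2 x 2 inverse is written out by hand to keep the example free of external libraries, and an observation falling exactly on the boundary is sent to the second population purely for determinism.

```python
def inverse_2x2(m):
    """Closed-form inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mat_vec(m, v):
    """Product of a 2 x 2 matrix and a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def ldf(z, mu1, mu2, sigma):
    """Z' S^-1 (mu1 - mu2) - 1/2 (mu1 + mu2)' S^-1 (mu1 - mu2)."""
    diff = [mu1[0] - mu2[0], mu1[1] - mu2[1]]
    total = [mu1[0] + mu2[0], mu1[1] + mu2[1]]
    w = mat_vec(inverse_2x2(sigma), diff)
    return dot(z, w) - 0.5 * dot(total, w)


def classify_ldf(z, mu1, mu2, sigma, log_k=0.0):
    """Population 1 when the LDF exceeds log(k), otherwise population 2."""
    return 1 if ldf(z, mu1, mu2, sigma) > log_k else 2


# Assumed illustration: identity covariance, means (1, 0) and (0, 0).
identity = [[1.0, 0.0], [0.0, 1.0]]
print(ldf((0.9, -3.0), (1.0, 0.0), (0.0, 0.0), identity))
print(classify_ldf((0.1, 2.0), (1.0, 0.0), (0.0, 0.0), identity))
```

With an identity covariance matrix and means that differ only in the first coordinate, the function reduces to the first coordinate minus one half of the mean shift, so the second coordinate plays no role in the decision.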

If the data originate from multivariate normal distributions with known parameters, but the covariance matrices are not identical for the two populations, the form of the likelihood ratio is quadratic:

$$\tfrac{1}{2} Z'(\Sigma_2^{-1} - \Sigma_1^{-1})Z + (\mu_1'\Sigma_1^{-1} - \mu_2'\Sigma_2^{-1})Z + \tfrac{1}{2}\left[\mu_2'\Sigma_2^{-1}\mu_2 - \mu_1'\Sigma_1^{-1}\mu_1 + \log\frac{|\Sigma_2|}{|\Sigma_1|}\right] \tag{4}$$

where $\mu_i$ and $\Sigma_i$ ($i = 1, 2$) are the mean vector and covariance matrix, respectively, from $\Pi_i$. Equation (4) is called the Quadratic Discriminant Function (QDF), and its classification decision is identical to that of the LDF. Marks & Dunn (1974) have shown that under the assumption of multivariate normality and unequal covariance matrices, the QDF misclassifies fewer observations than the LDF.
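
A hedged sketch of the QDF for two dimensions is given below. It evaluates the log likelihood ratio of two bivariate normal densities in its compact squared-distance form, which expands to a quadratic function of $Z$; the function names and the example parameters are assumptions made for illustration.

```python
import math


def inverse_and_det_2x2(m):
    """Return the inverse and determinant of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]], det


def quad_form(v, m):
    """Quadratic form v' m v for a length-2 vector and a 2 x 2 matrix."""
    return (v[0] * (m[0][0] * v[0] + m[0][1] * v[1])
            + v[1] * (m[1][0] * v[0] + m[1][1] * v[1]))


def qdf(z, mu1, sigma1, mu2, sigma2):
    """log f1(z) - log f2(z) for two bivariate normal densities."""
    inv1, det1 = inverse_and_det_2x2(sigma1)
    inv2, det2 = inverse_and_det_2x2(sigma2)
    d1 = [z[0] - mu1[0], z[1] - mu1[1]]
    d2 = [z[0] - mu2[0], z[1] - mu2[1]]
    return (0.5 * math.log(det2 / det1)
            - 0.5 * quad_form(d1, inv1)
            + 0.5 * quad_form(d2, inv2))


identity = [[1.0, 0.0], [0.0, 1.0]]
wide = [[4.0, 0.0], [0.0, 1.0]]  # assumed example of an unequal covariance

# With equal covariance matrices the value agrees with the LDF (0.4 here).
print(qdf((0.9, -3.0), (1.0, 0.0), identity, (0.0, 0.0), identity))
# With equal means, a point far out on the first axis favours the
# population with the larger first-dimension variance.
print(qdf((3.0, 0.0), (0.0, 0.0), wide, (0.0, 0.0), identity))
```

The first call illustrates the reduction of the QDF to the LDF when the two covariance matrices coincide: the quadratic terms and the log determinant ratio cancel.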

Equation (1) is optimal only when the densities are known and completely specified. It is infrequent, however, that researchers encounter situations where the distributions from which their data are drawn are completely specified. Usually, the densities are either completely unknown or are known except for one or more parameters. For these situations, the unknown parameters must be estimated from samples and procedures based on the sample estimates developed to classify new observations.

Peterson (1949) and Fix & Hodges (1951) determined that the best sample based procedure is of the likelihood ratio type where the sample estimates replace the unknown parameters (called "plug-in" procedures).

² The classification constant is log($k$) because of algebraic simplification. When $C_{12} = C_{21}$ and $q_1 = q_2$, log($k$) = 0.



Anderson (1951) developed a statistic that is the sample based analogue to the LDF for data sampled from multivariate normal distributions with a common but unknown covariance matrix and unknown mean vectors. Anderson's statistic is in the form of (3) with the maximum likelihood estimates ($\bar{X}_i$ and $S$) substituted for the mean vector and identical covariance matrix, respectively.

Using a similar argument, a sample based Quadratic Discriminant Function is analogous to (4), with the maximum likelihood estimates ($\bar{X}_i$ and $S_i$) replacing the unknown mean vectors and unequal covariance matrices.
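
The plug-in idea for the LDF can be sketched as follows: the sample means and a pooled covariance matrix are computed from the two criterion samples and substituted into the linear function. The helper names and the tiny data set are hypothetical. The pooled matrix below uses the divisor $n_1 + n_2 - 2$, which is an assumption of this sketch; a maximum likelihood estimate would divide by $n_1 + n_2$, and the two choices rescale the function without changing the classification when log($k$) = 0.

```python
def mean_vector(sample):
    """Component-wise mean of a list of (x1, x2) observations."""
    n = len(sample)
    return [sum(p[0] for p in sample) / n, sum(p[1] for p in sample) / n]


def scatter_matrix(sample):
    """Sums of squares and cross-products about the sample mean."""
    m = mean_vector(sample)
    sxx = sum((p[0] - m[0]) ** 2 for p in sample)
    syy = sum((p[1] - m[1]) ** 2 for p in sample)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in sample)
    return [[sxx, sxy], [sxy, syy]]


def pooled_covariance(sample1, sample2):
    """Pooled covariance with divisor n1 + n2 - 2 (assumed convention)."""
    a1, a2 = scatter_matrix(sample1), scatter_matrix(sample2)
    dof = len(sample1) + len(sample2) - 2
    return [[(a1[i][j] + a2[i][j]) / dof for j in range(2)] for i in range(2)]


def sample_ldf(z, sample1, sample2):
    """Linear discriminant function with sample estimates plugged in."""
    m1, m2 = mean_vector(sample1), mean_vector(sample2)
    (a, b), (c, d) = pooled_covariance(sample1, sample2)
    det = a * d - b * c
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    # w = S^-1 (xbar1 - xbar2), using the closed-form 2 x 2 inverse
    w = [(d * diff[0] - b * diff[1]) / det, (-c * diff[0] + a * diff[1]) / det]
    midpoint = [(m1[0] + m2[0]) / 2, (m1[1] + m2[1]) / 2]
    # Z'w - 1/2 (xbar1 + xbar2)'w, written as (Z - midpoint)'w
    return (z[0] - midpoint[0]) * w[0] + (z[1] - midpoint[1]) * w[1]


# Hypothetical criterion samples, chosen so the estimates are easy to check.
s1 = [(1.0, 0.0), (2.0, 1.0), (3.0, 0.0), (2.0, -1.0)]
s2 = [(-1.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.0, -1.0)]
print(sample_ldf((1.5, 0.0), s1, s2))
print(sample_ldf((0.5, 5.0), s1, s2))
```

A positive value assigns the observation to the first population and a negative value to the second, under equal costs and equal prior probabilities.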

Procedures for Unknown or Non-Normal Distributions

The LDF and QDF are optimal only when the data are multivariate normal. In the past, however, researchers have relied upon the sample based LDF or QDF to resolve the classification problem, regardless of the underlying distributions of the data. Lachenbruch, Sneeringer & Revo (1973), Johnson & Ramberg (1978) and Koffler & Penfield (1979) have investigated the robustness of the LDF and QDF when classifying observations from non-normal distributions. The three studies showed that when data were drawn from such distributions, the proportion of observations misclassified using the LDF or QDF was substantially altered from what was expected. Thus, the LDF and QDF are not robust to the normality assumption and researchers could be misled by using either procedure when investigating non-normal distributions.

Several nonparametric procedures have been suggested as possible alternatives to the LDF or QDF. Koffler & Penfield (1979) have empirically compared several nonparametric procedures. That study showed that procedures such as the Nearest Neighbor with Probability Blocks (Anderson, 1966; Fix & Hodges, 1951; Gessaman & Gessaman, 1972) and the Loftsgaarden-Quesenberry density estimator (Loftsgaarden & Quesenberry, 1972) classified observations as effectively as either the LDF or QDF when data were sampled from multivariate normal distributions and better than either procedure when data were sampled from non-normal distributions. However, these nonparametric procedures have limited utility because they generally require larger samples from each population since the entire density function must be estimated rather than simply unknown parameters.



Conover & Iman (1978) have suggested another solution to the classification problem for non-normal data based on first transforming the data to make the distribution functions approximately normal and then applying the sample based LDF or QDF to the transformed data. This procedure is considerably simpler to use than the other nonparametric alternatives and requires only one step more than the LDF or QDF, namely, the ranking of the data and observations to be classified.

Conover & Iman empirically contrasted their suggested classification procedures with other procedures (including the nonparametric ones of Koffler & Penfield and the LDF and QDF using the original data). They concluded that if the data were normally distributed, the rank methods performed equally as well as the LDF and QDF; if the data were non-normal, the rank transformation method worked better than the LDF or QDF and as well as any of the nonparametric alternatives. In all instances, the measure of performance considered was the overall proportion of misclassified observations.

It should not be surprising that the rank transformation was an appropriate one for the data. Many nonparametric procedures, such as the Mann-Whitney (Wilcoxon) test, the Kruskal-Wallis test and the Spearman rank order correlation, are based upon rank transformations and have been shown to be effective alternatives to their parametric counterparts. Furthermore, the rank transformation has been shown to work effectively in multivariate regression analysis (Iman & Conover, 1977) and in the analysis of experimental data (Iman, 1974; Conover & Iman, 1976).

Normal Scores Type Transformations

A natural extension of the Conover & Iman (1978) study involves the investigation of alternative transformations that could be used to effectively classify data from all types of distributions.

The normal scores transformation is one that should be considered. This type of transformation derives its values from various properties of the normal distribution. Two forms of the transformation are usually considered: the expected normal order statistic (Hoeffding, 1951; Terry, 1952) and the inverse normal score (Van der Waerden, 1952, 1955, 1956).

Tests based on normal scores transformations have not been used as frequently as those based upon ranks. However, the results from those instances where such transformations have been applied suggest that they have utility in a number of situations, specifically for discriminant analysis.

The efficiency of one test ($T_1$) relative to another ($T_2$) can be determined by comparing the ratio $n_2/n_1$, where $n_i$ is the sample size of $T_i$ ($i = 1, 2$), under the condition that both tests are used to test a specific hypothesis, have identical $\alpha$ and $\beta$ levels and, therefore, are comparable with respect to level of significance and power (Conover, 1971).

Tests based upon the normal scores transformation have been extensively used for the k-sample location problem ($k \geq 2$). For the two sample problem the Mann-Whitney (Wilcoxon) statistic has an efficiency relative to the t-test of 95.5% for normal distributions, 100% for uniform distributions, and may be infinite for other distributions.

A similar test which utilizes a normal scores transformation has an asymptotic relative efficiency to the t-test of 100% when the t-test assumptions are satisfied, and greater than 100% when the t-test assumptions are violated. The normal scores test is more efficient than the Mann-Whitney (Wilcoxon) test when the distributions break off abruptly (e.g. uniform or exponential), the rank test is more efficient for distributions with heavy tails (e.g. logistic or Cauchy), and there is essentially no difference between the two tests when the distributions are approximately normal (Lehmann, 1975).

When $k \geq 3$, the Kruskal-Wallis test is generally used when the assumptions of the one-way analysis of variance F test are not satisfied. Hajek & Sidak (1967) derived test statistics based upon expected normal order statistics and inverse normal scores. Puri (1964) showed that the asymptotic relative efficiency of Hajek & Sidak's normal scores test relative to the Kruskal-Wallis test or to the F test is the same as that of the two sample normal scores test relative to the Mann-Whitney (Wilcoxon) test or t-test. Furthermore, Pratt (1964) has shown that these normal scores tests are far less sensitive to non-homogeneity of variance than is the F test or the Kruskal-Wallis test.

Because of the efficacy of the normal scores transformation for the location problem and its superiority to the rank transformation in certain situations, it is of value to determine whether procedures based on these transformations can be used to resolve the classification problem for data sampled from non-normal distributions. A natural extension to the Conover & Iman (1978) study is an investigation of the effectiveness of classifying observations with the LDF and QDF based upon a normal scores transformation. The purpose of the research described in this paper is to empirically contrast classification procedures based on normal scores with those based upon ranks and upon the original data when the data originate from both normal and non-normal distributions.

Methodology

To estimate the LDF and QDF parameters, criterion samples of varying sizes were generated for four types of two dimensional distributions. The four distributions considered were the bivariate normal distribution and non-normal representatives from three classes of distributions: 1) finite range (logit normal); 2) semi-infinite range (log normal); and 3) infinite range (inverse hyperbolic sine normal). In all instances the two dimensions were independent.

The three non-normal distributions were generated from the Johnson (1949) system of distributions. To obtain the required non-normal samples, normally distributed random variables were generated and then the appropriate inverse transformation applied. The Johnson system of transformations is summarized in Table 1. In Table 1 the variable $y$ is normally distributed, while the variable $x$ is distributed according to the appropriate non-normal distribution. An algorithm by Ramberg & Schmeiser (1972), based upon the inverse function of the lambda distribution, was used to generate the normal deviates. Random deviates from a uniform distribution were needed to obtain the normal deviates. A multiplicative congruential procedure developed by Kossack & Henschke (1975) was used for this purpose.



TABLE 1

TRANSFORMATIONS (AND THEIR INVERSES)
THAT GENERATE THE JOHNSON (1949)
SYSTEM OF DISTRIBUTIONS

DISTRIBUTION                     TRANSFORMATION                      INVERSE

Log Normal                       y = log x            0 < x < ∞      x = exp(y)

Logit Normal                     y = log(x/(1 - x))   0 < x < 1      x = exp(y)/(1 + exp(y))

Inverse Hyperbolic Sine Normal   y = sinh^-1(x)      -∞ < x < ∞      x = sinh(y)
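
The generation step can be sketched by applying the inverse transformations of Table 1 to independent normal deviates. Python's `random.gauss` stands in here for the Ramberg & Schmeiser lambda-distribution algorithm and the multiplicative congruential generator used in the study; that substitution, the dictionary and function names, and the seed are assumptions of this illustration.

```python
import math
import random

# (forward transformation y = g(x), inverse x = g^-1(y)) for each family
JOHNSON_TRANSFORMS = {
    "log normal": (math.log, math.exp),
    "logit normal": (
        lambda x: math.log(x / (1.0 - x)),
        lambda y: math.exp(y) / (1.0 + math.exp(y)),
    ),
    "inverse hyperbolic sine normal": (math.asinh, math.sinh),
}


def generate_sample(n, mean, family, rng):
    """Draw n bivariate observations with independent components.

    Each component is a unit-variance normal deviate centred at the
    matching entry of `mean`, passed through the inverse transformation.
    """
    _, inverse = JOHNSON_TRANSFORMS[family]
    return [
        (inverse(rng.gauss(mean[0], 1.0)), inverse(rng.gauss(mean[1], 1.0)))
        for _ in range(n)
    ]


rng = random.Random(1949)  # arbitrary seed, for reproducibility only
criterion_1 = generate_sample(8, (1.0, 0.0), "logit normal", rng)
criterion_2 = generate_sample(27, (0.0, 0.0), "logit normal", rng)
print(criterion_1[0])
print(len(criterion_1), len(criterion_2))
```

Applying the forward transformation to any generated value returns the underlying normal deviate, which is the property the optimal classification argument for the non-normal data relies on.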



The bivariate normal distributions that were used to generate the non-normal samples for $\Pi_1$ and $\Pi_2$ each had the identity matrix for its covariance matrix. The mean vector for $\Pi_1$ was ($\mu$, 0) and for $\Pi_2$ it was (0, 0). For each of the four distributions, samples were generated for each combination of sample size [($n_1$, $n_2$) = (8,8), (8,27), (8,64), (8,200), (27,27), (27,64), (27,200), (64,64), (64,200), (200,200)] and first component of the mean vector for $\Pi_1$ ($\mu$ = 1, 2).³ In total there were samples drawn from twenty combinations of ($n_1$, $n_2$) and $\mu$ for each distribution.

The sample based LDF and QDF were used to establish the classification rules, assuming equal costs of misclassification and equal a priori probabilities of group membership (i.e. log($k$) = 0). As previously outlined, both the LDF and QDF involve the estimation of the population means and the covariance matrix (either pooled for the LDF or separate for the QDF) from the criterion samples. These estimated values are then substituted into (3) and (4). The parameter estimates for the LDF and QDF were obtained in three ways, using the raw data, the ranks of the data and the corresponding inverse normal scores.⁴

³ These values of $n_1$, $n_2$, and $\mu$ were selected to parallel previous studies, including those of Conover & Iman (1978) and Koffler & Penfield (1979).

Once the LDF and QDF parameters were estimated for each combination of sample size and first component of the mean vector for $\Pi_1$, index samples consisting of 1000 new observations from each original population were generated. For each data point, the rank and the inverse normal score were computed. Each value of the index samples was entered into (3) and (4) and the classification of the value determined. The proportion of misclassified observations for each sample and over all samples was obtained. In all there were six classification methods studied (the LDF and QDF based on the raw data, ranks, and inverse normal scores).

The process was repeated 20 times. Thus, the population parameters were estimated 20 different times and each time 2000 observations were classified. The estimated probability of misclassification for each sample was based on 20,000 observations and the overall estimated probabilities of misclassification were based on 40,000 classifications for each combination of $n_1$, $n_2$, and $\mu$.

⁴ The inverse normal scores transformation differs little from the expected normal order statistic transformation. The two transformations are asymptotically equivalent and structurally identical (McSweeney & Penfield, 1968). The inverse normal scores transformation was used because of its ease of computation.

⁵ All computer programs to generate the data and classification procedures were written in the FORTRAN IV programming language.

To obtain the ranks of the data, the two criterion samples of size $n_1$ and $n_2$ were combined. All observations in each of the two dimensions were then replaced by their corresponding rank; rank 1 for the smallest observation to rank $N$ ($N = n_1 + n_2$) for the largest observation in each dimension. Each dimension was ranked separately and ranks of tied observations were assigned randomly.

To obtain the rank for each of the 1000 observations in the index samples, each new observation was compared dimension by dimension with all $N$ original observations. For each dimension of the new observation, the original score was replaced by a number obtained by linear interpolation between two adjacent ranks from the original criterion samples. These interpolated ranks represented the placement of that dimension among the corresponding values of the same dimension in the $N$ criterion sample observations (Conover & Iman, 1978).
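
The ranking of the combined criterion samples and the interpolated rank of a new observation can be sketched for a single dimension as follows. The function names are hypothetical. The treatment of a new value that falls outside the range of the criterion values is not described in the text; assigning it rank 1 or rank $N$ is an assumption of this sketch.

```python
import bisect
import random


def rank_dimension(values, rng):
    """Ranks 1..N for one dimension, with ties broken at random."""
    order = sorted(range(len(values)), key=lambda i: (values[i], rng.random()))
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def interpolated_rank(x, sorted_values):
    """Rank of a new value by linear interpolation between adjacent ranks.

    `sorted_values` holds the N criterion values of one dimension in
    ascending order. Values outside their range receive rank 1 or N
    (an assumed convention).
    """
    n = len(sorted_values)
    if x <= sorted_values[0]:
        return 1.0
    if x >= sorted_values[-1]:
        return float(n)
    upper = bisect.bisect_right(sorted_values, x)  # first value above x
    lower = upper - 1                              # last value at or below x
    x_lo, x_hi = sorted_values[lower], sorted_values[upper]
    return (lower + 1) + (x - x_lo) / (x_hi - x_lo)


criterion = [3.2, 1.1, 5.0, 2.4]  # hypothetical combined criterion values
print(rank_dimension(criterion, random.Random(0)))
print(interpolated_rank(3.0, sorted(criterion)))
```

In the example the new value 3.0 lies three quarters of the way from 2.4 (rank 2) to 3.2 (rank 3), so it receives the interpolated rank 2.75.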

The derivation of the Van der Waerden inverse normal scores transformation is based upon the ranks of the data. For this transformation, assume the rank of the $i$th largest observation in a particular dimension is denoted by $R_i$ and $\Phi(X)$ represents the cumulative distribution function of a standard normal random variable. The Van der Waerden transformation is derived first by dividing each of the ranks $R_i$ by the quantity ($N + 1$). This creates a distribution of scores in the interval (0, 1). Then, by considering $R_i/(N + 1)$ as a percentile of a normal distribution (i.e. $\Phi(X_i) = R_i/(N + 1)$), the $X_i$ values can be determined by performing the inverse operation. That is, if $\Phi(X_i) = R_i/(N + 1)$, then $X_i = \Phi^{-1}(R_i/(N + 1))$. The $X_i$'s form the Van der Waerden inverse normal scores.
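
A short sketch of the Van der Waerden transformation follows, using the inverse normal distribution function from Python's standard library. The function name is hypothetical; the same function can be applied to interpolated (fractional) ranks of index observations, provided $N$ is the size of the combined criterion sample.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def inverse_normal_scores(ranks, n_total):
    """Map ranks R_i to X_i = Phi^-1(R_i / (N + 1))."""
    return [STANDARD_NORMAL.inv_cdf(r / (n_total + 1)) for r in ranks]


# Ranks 1, 2, 3 in a combined sample of N = 3 give the percentiles
# 0.25, 0.50 and 0.75 of the standard normal distribution.
scores = inverse_normal_scores([1, 2, 3], n_total=3)
print(scores)
```

Dividing by $N + 1$ rather than $N$ keeps every percentile strictly between 0 and 1, so the largest observation never maps to an infinite score.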
Optimal Probability of Misclassification

For each of the distributions, it is possible to determine the optimal probability of misclassification (i.e. $P(Z \in \Pi_i \mid \Pi_j)$ when the population parameters are completely specified and the distributions known).⁶ Anderson (1951) has shown that the optimal probability of misclassification associated with the LDF is denoted by $\Phi(-\Delta/2)$, where $\Delta^2$ is the Mahalanobis distance between the two populations. Since the multivariate normal distributions in the present study were independent and the only non-zero mean component is $\mu$, $\Delta^2$ simplifies to $\mu^2$. Hence, $\Phi(-\Delta/2) = \Phi(-\mu/2)$.

The corresponding values for the optimal probability of misclassification for the bivariate normal distributions under study are $\Phi(-1/2) = 0.3085$ and $\Phi(-2/2) = 0.1587$. Anderson (1951) has shown that the LDF minimizes the sum of the individual probabilities of misclassification (i.e. P(1/2) + P(2/1)). In the case of multivariate normal distributions, this occurs when P(1/2) = P(2/1).⁷
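
The two optimal rates can be reproduced with a few lines of Python. The sketch assumes, as in the study design, an identity covariance matrix, so that the Mahalanobis distance reduces to the Euclidean distance between the two mean vectors; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def optimal_error_rate(mean_1, mean_2):
    """Phi(-Delta / 2) for two normal populations with identity covariance."""
    delta = math.dist(mean_1, mean_2)  # Euclidean distance between the means
    return NormalDist().cdf(-delta / 2.0)


for mu in (1.0, 2.0):
    print(mu, round(optimal_error_rate((mu, 0.0), (0.0, 0.0)), 4))
```

Because the second mean component is zero in both populations, only the first component contributes to the distance.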

Since the non-normal data were obtained from non-linear transformations of data drawn from bivariate normal distributions, those data can be transformed back to the bivariate normal distributions by performing the inverse operation. The optimal classification procedure for the non-normal data involves transforming the data to normality and then applying the LDF. Thus, the optimal probability of misclassification for the non-normal data is identical to that of the original bivariate normal distributions.

⁶ For simplicity, let $P(Z \in \Pi_1 \mid \Pi_2)$ = P(1/2), $P(Z \in \Pi_2 \mid \Pi_1)$ = P(2/1) and P = the overall error rate.

⁷ This is true when there are equal costs of misclassification, equal a priori probabilities of group membership and completely specified multivariate normal distributions with equal covariance matrices.



For each combination of $n_1$, $n_2$, and $\mu$, the proportions of misclassified observations, $\hat{P}$(1/2), $\hat{P}$(2/1), and $\hat{P}$, were determined and served as the performance criteria and means of comparison among the six procedures.⁸ Given the optimal values for the probabilities of misclassification, the effectiveness of the six sample based procedures can be determined by comparing the proportions of misclassification to the optimal rate. The empirically determined proportions of misclassification are estimates of the optimal values, and the procedure that provides the best estimates is considered to be most effective.

Two criteria for comparison were considered. The first was the relative disparity between P(1/2) and P(2/1) for each of the procedures. The smaller the disparity, the more effective the procedure (assuming that the overall proportion of misclassified observations approached the optimal probability). The second criterion was the overall error rate. These values were compared with the optimal values and a close agreement indicated an effective procedure. Tests of proportion and associated post hoc procedures (Marascuilo, 1966) were used to analyze the data. It is important to note that in many instances while the differences among the proportions were small, they were statistically significant (p < .05).

Results

The results for each of the four distributions are presented in Tables 2, 4, 5, and 6.⁹ An examination of the tables shows that the proportion of misclassified observations for the rank transformation or for the inverse normal scores transformations was identical for all of the distributions. This is to be expected since the non-normal data were derived from monotonic transformations of the normal data. Because of the monotonicity of the transformations, the order of the data remained unchanged regardless of the distribution. Therefore, the ranks and inverse normal scores of the original data were unchanged, the sample estimates of the population parameters were likewise unchanged, and the classification decisions were identical.

⁸ $\hat{P}$(i/j) is the empirically determined estimate of P(i/j). $\hat{P}$ is the overall error rate and is equal to [$\hat{P}$(1/2) + $\hat{P}$(2/1)]/2 because equal numbers of observations were classified from each sample. These estimates represent the average proportion of misclassified observations for the 20 trials.

⁹ The following abbreviations are used for the remainder of the paper: LDF = LDF procedure based on the raw data; RLDF = LDF procedure based on the ranks; ILDF = LDF procedure based on the inverse normal scores. A similar set of abbreviations is used for the QDF procedures.
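
The invariance of the ranks under the monotonic transformations of the Johnson system can be checked numerically. In the sketch below the sample of 20 normal deviates, the seed, and the helper name are assumptions chosen only for illustration.

```python
import math
import random


def ranks(values):
    """Ranks 1..N of a list of distinct values."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


# Strictly increasing inverse transformations of the Johnson system.
transforms = {
    "log normal": math.exp,
    "logit normal": lambda y: math.exp(y) / (1.0 + math.exp(y)),
    "inverse hyperbolic sine normal": math.sinh,
}

rng = random.Random(1979)  # arbitrary seed
normal_values = [rng.gauss(0.0, 1.0) for _ in range(20)]
normal_ranks = ranks(normal_values)

# The ranks of each transformed sample match the ranks of the normal sample.
all_match = all(
    ranks([g(y) for y in normal_values]) == normal_ranks
    for g in transforms.values()
)
print(all_match)
```

Since the inverse normal scores are computed from the ranks alone, they inherit the same invariance, which is why the rank and inverse normal scores procedures give one set of results for all four distributions.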

Normal Distribution

Table 2 presents the results for the bivariate normal samples of data. For these data, it was expected that the performance of the LDF and QDF should be almost identical because the covariance matrices were estimated from populations both having the identity covariance matrix. From (4) it is evident that when the two covariance matrices are identical the QDF is equivalent to the LDF.

When $n_1 = 8$, the three procedures based on the LDF had approximately the same overall proportion of misclassified observations and discrepancy between P(1/2) and P(2/1). In all cases, the estimated overall proportion of misclassification was significantly greater than the optimal value of 0.3085. This result was not unexpected since the sample mean and covariance estimates for $\Pi_1$ were based upon eight observations and thus had a large standard error.

The three procedures based on the QDF misclassified considerably more observations than those based on the LDF. The fact that the covariance matrix for $\Pi_1$ was based on so few observations also provides an explanation for the large differences between the LDF and QDF type procedures; the pooled sample covariance matrix was probably very different from the separate covariance matrices.

When $n_1 = n_2 > 8$, all six procedures misclassified approximately the same proportion of observations and were approximately equivalent to the optimal value. In all cases, the difference between P(1/2) and P(2/1) was smallest for the LDF based on the original data; however, the discrepancy for all of the procedures was similar.

When ($n_1$, $n_2$) = (27,64) or (27,200), the three LDF procedures minimized the proportion of overall errors; however, the proportion of overall errors for the three QDF procedures did not differ substantially from the ones for the LDF, especially for the RQDF. When $n_1 = 27$ and $n_2 = 64$, the difference between P(1/2) and P(2/1) was approximately equal for all of the procedures; when $n_1 = 27$ and $n_2 = 200$, the discrepancies were smallest for the ILDF and IQDF. However, none of the procedures exhibited discrepancies that were extensive. When ($n_1$, $n_2$) = (64,200), all of the procedures were equally as effective in terms of the overall error rate. For the relative discrepancy between P(1/2) and P(2/1), the LDF and QDF, based on the original data, minimized the difference, while the RLDF and RQDF exhibited a relatively severe inflation/deflation phenomenon (i.e. P(1/2) was considerably smaller than the optimal value while P(2/1) was considerably larger).



$\mu = 2$

There was less of a disparity among the six procedures when $\mu = 2$ than when $\mu = 1$ for the bivariate normal data. When $n_1 = 8$, the three procedures based on the LDF were again most effective in minimizing the overall error rate. However, for the two largest values of $n_2$, the RLDF procedure was not as accurate as the LDF or ILDF. Additionally, the discrepancy between P(1/2) and P(2/1) was considerably larger for the RLDF than for the LDF or ILDF.

When $n_1 = n_2 > 8$, there was no appreciable difference among the overall error rates or the discrepancies between P(1/2) and P(2/1) for the six procedures. Additionally, as the sample size increased P approached 0.1587, the optimal error rate. For the situation when the sample sizes were unequal and both greater than 8, there was no discernible difference among the overall error rates when ($n_1$, $n_2$) = (27,64) or (64,200); however, when ($n_1$, $n_2$) = (27,200), the RLDF and RQDF classified larger numbers of observations incorrectly than the other four procedures. Additionally, for all three of these sample sizes, the RLDF and RQDF discrepancies were significantly larger than the discrepancies for the other four procedures.

Summary

As expected, the LDF based on the original data proved to be an effective classification procedure for the bivariate normal data. Furthermore, as the sample sizes increased, the QDF effectively classified the data because the separate covariance estimates and pooled covariance estimates began to converge to the identity matrix.

In all instances, the RLDF and ILDF proved to be as effective as the LDF. The only exception to this occurred for the RLDF when the sample sizes were most disparate. When the sample sizes were equal, there was no discernible difference in the classification ability of the three LDF methods.

■ Non-Normal .Distributions , ' 

Table 3 illustrates the means and variances for each dimension of the
three non-normal distributions. Clearly, the variances for one population
are markedly different from those for the other for each of the non-normal
samples. The difference between the two populations is only in the first
dimension of the mean vector; however, this affects the entire
classification process through the sample covariance matrix. It was
therefore appropriate to consider classification according to the QDF
procedure for these data.

Note that the procedures based upon the ranks and the inverse normal
scores were identical for all of the non-normal distributions and for the
bivariate normal distributions because the transformations were monotonic.
Because of that, the three non-normal distributions can be considered
together with respect to the classification of the index data based on the
rank and inverse normal scores procedures. They must, however, be considered
separately with respect to the LDF and QDF based on the original data.

An examination of the results reveals that the LDF and QDF classified
both the log normal and inverse hyperbolic sine normal data similarly,
while they classified the logit normal data differently from the other
two, but similarly to the bivariate normal data. Hence, the results for the
classification of the log normal and inverse hyperbolic sine normal data
will be discussed together and the logit normal data separately.
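
The invariance of the rank and inverse normal scores procedures under
monotonic transformations, noted above, can be checked numerically. The
following is a minimal sketch, not the authors' program. It assumes that the
three non-normal variates are obtained from a normal variate Z as exp(Z),
sinh(Z) and 1/(1 + exp(-Z)), and that the inverse normal score of an
observation with rank r in a sample of size n is the standard normal quantile
at r/(n + 1). Both conventions, and the helper names ranks and normal_scores,
are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def ranks(values):
    """Ranks 1..n of the values; assumes continuous data with no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def normal_scores(values):
    """Inverse normal scores, assumed here to be Phi^{-1}(r / (n + 1))."""
    n = len(values)
    return [NormalDist().inv_cdf(r / (n + 1)) for r in ranks(values)]


random.seed(0)
z = [random.gauss(0.0, 1.0) for _ in range(20)]  # underlying normal sample

# Assumed forms of the three monotonic transformations.
transforms = {
    "log normal": [math.exp(v) for v in z],
    "inverse hyperbolic sine normal": [math.sinh(v) for v in z],
    "logit normal": [1.0 / (1.0 + math.exp(-v)) for v in z],
}

for name, x in transforms.items():
    same_ranks = ranks(x) == ranks(z)
    same_scores = normal_scores(x) == normal_scores(z)
    print(name, same_ranks, same_scores)
```

Since every transformed sample yields the same ranks and normal scores as the
normal sample, any procedure applied to the ranks or scores gives the same
classification for all four distributions.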
TABLE 3

MEANS AND VARIANCES OF THE NON-NORMAL DISTRIBUTIONS
FOR SPECIFIED MEANS OF THE NORMAL DISTRIBUTION
(σ² = 1)

SOURCE: Lachenbruch, Sneeringer and Revo. Robustness of the Linear and
Quadratic Discriminant Function to Certain Types of Non-Normality.
Communications in Statistics, 1973, 1, 54.

NOTE: σ² is the variance and μ is the mean of the underlying normal
distribution. The tabulated mean and variance are those of the
transformed non-normal variate.
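
For the log normal case, entries of the kind Table 3 reports follow from
standard closed forms. The sketch below assumes the log normal variate is
X = exp(Z), with Z normal with mean μ and variance σ² = 1, so that
E(X) = exp(μ + σ²/2) and Var(X) = (exp(σ²) - 1) exp(2μ + σ²). The values of
μ used (0, 1 and 2) and the function name lognormal_moments are illustrative
assumptions, not figures taken from the table.

```python
import math


def lognormal_moments(mu, sigma2=1.0):
    """Mean and variance of exp(Z) when Z is normal(mu, sigma2)."""
    mean = math.exp(mu + sigma2 / 2.0)
    variance = (math.exp(sigma2) - 1.0) * math.exp(2.0 * mu + sigma2)
    return mean, variance


# Illustrative means of the underlying normal distribution (assumed values).
for mu in (0.0, 1.0, 2.0):
    mean, variance = lognormal_moments(mu)
    print(f"mu = {mu:.0f}: mean = {mean:.4f}, variance = {variance:.4f}")
```

Under these assumptions a shift in the mean of the underlying normal variate
changes the variance of the transformed variate as well, which is why the two
populations no longer share a common covariance matrix.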




Log Normal and Inverse Hyperbolic Sine Normal Distributions

The results for these data appear in Tables 4 and 5. For all combinations
of sample size and μ, the LDF and QDF based on the original data
misclassified significantly more observations than the procedures based upon
the rank or inverse normal scores transformations. The LDF and QDF based on
the original data further exhibited a severe inflation/deflation effect.
The LDF and QDF were clearly inappropriate for these types of non-normal
distributions. Thus, the remaining discussion will consider only the four
nonparametric procedures.




When μ = 1 and n₁ = 8, the RLDF and ILDF procedures minimized the overall
error rate. For all other combinations of sample size, the four
nonparametric procedures were equally effective, with the exception of
the IQDF when (n₁, n₂) = (27, 64) or (27, 200). The inflation/deflation
effect related to the discrepancy between P(1/2) and P(2/1) was smallest
for the ILDF; however, in most instances, there was little difference
among the four procedures.
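
The quantities compared in this section can be computed from a set of
classification results as in the following sketch. It assumes that P(i/j)
denotes the proportion of observations from population j that are assigned
to population i, and that P is the total proportion misclassified. The
function name error_rates and the small example are hypothetical.

```python
def error_rates(true_labels, predicted_labels):
    """Return (P(1/2), P(2/1), P) for two populations labelled 1 and 2."""
    pairs = list(zip(true_labels, predicted_labels))
    n1 = sum(1 for t, _ in pairs if t == 1)
    n2 = sum(1 for t, _ in pairs if t == 2)
    from1_into2 = sum(1 for t, p in pairs if t == 1 and p == 2)
    from2_into1 = sum(1 for t, p in pairs if t == 2 and p == 1)
    p12 = from2_into1 / n2  # from population 2, assigned to population 1
    p21 = from1_into2 / n1  # from population 1, assigned to population 2
    p_total = (from1_into2 + from2_into1) / (n1 + n2)
    return p12, p21, p_total


# Hypothetical results: 10 observations from each population.
true_labels = [1] * 10 + [2] * 10
predicted = [1] * 8 + [2] * 2 + [1] * 4 + [2] * 6

p12, p21, p_total = error_rates(true_labels, predicted)
print(f"P(1/2) = {p12:.2f}, P(2/1) = {p21:.2f}, P = {p_total:.2f}")
```

In this made-up example the overall rate is 0.30, but the two conditional
rates are 0.40 and 0.20. A gap of this kind between P(1/2) and P(2/1) is
what the text calls the inflation/deflation effect.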



The pattern for μ = 2 was essentially identical to the pattern when
μ = 1. When n₁ = 8, the RLDF and ILDF were most effective in minimizing
the overall error rate. As the sample size increased, the four procedures
became indistinguishable in terms of P, and P approached the optimal rate
of 0.1587. However, upon examination of P(1/2) and P(2/1), it became
apparent that the inflation/deflation effect was substantial for both the
RLDF and RQDF in many instances. The discrepancies for the ILDF and IQDF
were approximately the same and smaller than those for the rank type
procedures.

TABLE 4

RESULTS FOR THE LOG NORMAL DISTRIBUTION
TOTAL PROPORTION OF MISCLASSIFIED OBSERVATIONS

TABLE 5

RESULTS FOR THE INVERSE HYPERBOLIC SINE NORMAL DISTRIBUTION
TOTAL PROPORTION OF MISCLASSIFIED OBSERVATIONS

Summary

For these types of non-normal distributions, the LDF and QDF based
upon the original data were clearly inappropriate as classification
procedures. The proportion of misclassified observations was substantially
larger than the optimal rate, and also substantially larger than the overall
error rate of the nonparametric procedures. The discrepancy between P(1/2)
and P(2/1) was substantial.
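
The optimal rate referred to here can be reproduced under the following
assumptions: equal prior probabilities, an identity covariance matrix for
the underlying bivariate normal variates, and mean vectors that differ by μ
in the first dimension only. Under those assumptions the optimal overall
error rate is Φ(-μ/2), where Φ is the standard normal distribution function.
The function name optimal_error_rate is introduced only for this sketch.

```python
from statistics import NormalDist


def optimal_error_rate(delta):
    """Optimal overall error rate for two normal populations whose means are
    delta standard deviations apart (equal priors, common covariance)."""
    return NormalDist().cdf(-delta / 2.0)


for mu in (1.0, 2.0):
    rate = optimal_error_rate(mu)
    print(f"mu = {mu:.0f}: optimal overall error rate = {rate:.4f}")
```

For μ = 2 this gives 0.1587, the value quoted in the text. Because ranks and
normal scores are unchanged by monotonic transformations, the same figure is
a natural benchmark for the rank and normal scores procedures on the
non-normal data.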

When μ = 1, the procedures based on the ranks and on the normal scores
were approximately equal. As μ increased (i.e. μ = 2) and the distance
between the two distributions increased, the procedures based on the
inverse normal scores transformation classified the data more appropriately
based on the criterion of P(1/2) = P(2/1).

Logit Normal Distribution

Table 6 presents the results for the logit normal distribution. As
outlined previously, the results for the rank and normal scores type
procedures were identical for all of the distributions. Therefore, the only
difference concerns whether the procedures based on the LDF and QDF for the
original data were appropriate for the data. For the logit normal samples,
the LDF and QDF classified the data equally as well as the four
nonparametric procedures. In fact, the results for this distribution were
almost identical to the results for the bivariate normal distribution. For
that reason, a discussion of these results is omitted and the reader should
consult the section outlining the bivariate normal results.




TABLE 6

RESULTS FOR THE LOGIT NORMAL DISTRIBUTION
Conclusions

With samples drawn from bivariate normal distributions with equal
covariance matrices, the proportion of observations misclassified using
the LDF procedure based on the ranks or the normal scores of the data was
not considerably different from that for the LDF based on the original
data. Furthermore, the RLDF and IQDF proportions of misclassification
were almost equivalent to the optimal values in the bivariate normal case.
Hence, it is to be expected that they would also be approximately equal to
the optimal value for the non-normal situations because those procedures
are not affected by the transformation from normality.

For the non-normal distributions, the LDF and QDF were clearly
inappropriate. The type of non-normality, however, appeared to have some
effect on the performance of those procedures. The LDF and QDF suffered
least when the distribution was bounded above and below (i.e. for the finite
range logit normal distribution). When the range was semi-infinite or
infinite, there was a substantial increase in the overall error rate and the
inflation/deflation was considerable.

A discussion of sample size is appropriate. With the normally
distributed samples, little was gained by using sample sizes larger than
27 for any of the procedures. This was also true for the procedures
based on the ranks and inverse normal scores for the non-normal data.
This result contrasts with the nonparametric methods studied by
Koffler & Penfield (1979), which required a fairly large sample size and
showed improvements as the sample size increased beyond 64.

When n₁ = n₂, i.e. the sample sizes for estimating the density
function parameters were equal, the four nonparametric procedures classified
the data equally as well. When the sample sizes were unequal, the procedures
based on the inverse normal scores tended to classify the data more
effectively. For those situations the procedures based upon the ranks
exhibited an inflation/deflation effect.

In summary, when the distributions are normal, the rank and inverse
normal scores methods are effective substitutes for the LDF and QDF.
When the populations are non-normal, the LDF methods based on the ranks
or the inverse normal scores are more effective than the LDF or QDF methods
based on the raw data. Finally, when the criterion sample sizes are unequal,
the inverse normal scores approach is more desirable than the rank approach.
When the criterion sample sizes are equal, either of the two procedures can
be used.
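
The three LDF variants compared in these conclusions can be sketched as
follows. This is an illustration under stated assumptions, not the procedure
used in the study: each variable is ranked over the combined criterion
sample, normal scores are taken as standard normal quantiles at r/(N + 1),
the data are simulated log normal variates with μ = 2, and error rates are
obtained by reclassifying the criterion sample rather than separate index
observations. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def column_ranks(column):
    """Ranks 1..N within one variable; assumes no ties."""
    order = sorted(range(len(column)), key=lambda i: column[i])
    result = [0.0] * len(column)
    for position, index in enumerate(order, start=1):
        result[index] = float(position)
    return result


def transform(points, use_normal_scores):
    """Rank each variable over the combined sample (assumed convention);
    optionally convert the ranks to inverse normal scores."""
    n = len(points)
    columns = []
    for j in range(2):
        r = column_ranks([p[j] for p in points])
        if use_normal_scores:
            r = [NormalDist().inv_cdf(v / (n + 1)) for v in r]
        columns.append(r)
    return list(zip(columns[0], columns[1]))


def fit_ldf(group1, group2):
    """Linear discriminant function from sample means and pooled covariance."""
    def mean(group):
        return [sum(p[j] for p in group) / len(group) for j in range(2)]

    m1, m2 = mean(group1), mean(group2)
    s = [[0.0, 0.0], [0.0, 0.0]]  # pooled sums of squares and cross products
    for group, m in ((group1, m1), (group2, m2)):
        for p in group:
            d = (p[0] - m[0], p[1] - m[1])
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
    dof = len(group1) + len(group2) - 2
    s = [[v / dof for v in row] for row in s]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = (m1[0] - m2[0], m1[1] - m2[1])
    w = (inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1])
    mid = ((m1[0] + m2[0]) / 2.0, (m1[1] + m2[1]) / 2.0)

    def classify(p):
        score = w[0] * (p[0] - mid[0]) + w[1] * (p[1] - mid[1])
        return 1 if score > 0 else 2

    return classify


random.seed(1)
n1 = n2 = 50
# Simulated log normal data; the populations differ in the first dimension.
pop1 = [(math.exp(random.gauss(0, 1)), math.exp(random.gauss(0, 1)))
        for _ in range(n1)]
pop2 = [(math.exp(random.gauss(2, 1)), math.exp(random.gauss(0, 1)))
        for _ in range(n2)]
combined = pop1 + pop2
labels = [1] * n1 + [2] * n2


def resubstitution_error(points):
    """Proportion of the criterion sample misclassified by its own LDF."""
    classify = fit_ldf(points[:n1], points[n1:])
    wrong = sum(1 for p, t in zip(points, labels) if classify(p) != t)
    return wrong / len(points)


results = {
    "LDF (raw data)": resubstitution_error(combined),
    "RLDF (ranks)": resubstitution_error(transform(combined, False)),
    "ILDF (normal scores)": resubstitution_error(transform(combined, True)),
}
for name, rate in results.items():
    print(f"{name}: {rate:.2f}")
```

Reclassifying the criterion sample gives optimistic error rates, so the
output indicates only the relative behaviour of the three methods and is not
comparable with the tabled results.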




REFERENCES 


Anderson, T.W. Classification by multivariate analysis. Psychometrika, 1951,
16, 31-50.

Anderson, T.W. Some nonparametric multivariate procedures based on
statistically equivalent blocks. Multivariate Analysis: Proceedings of the
International Symposium, Dayton, Ohio, 1965. New York: Academic Press,
1966, 5-27.

Conover, W.J. Practical Nonparametric Statistics. New York: John Wiley, 1971.

Conover, W.J. & Iman, R.L. On some alternative procedures using ranks for
the analysis of experimental designs. Communications in Statistics:
Theory & Methods, 1976, A5, 1349-1368.

Fix, E. & Hodges, J.L. Nonparametric discrimination: Consistency properties.
Lackland Air Force Base, Texas: U.S. School of Aviation Medicine, 1951.

Gessaman, M.P. & Gessaman, P.H. A comparison of some multivariate
discrimination procedures. Journal of the American Statistical Association,
1972, 67, 468-472.

Hajek, J. & Sidak, Z. Theory of Rank Tests. Prague: Academic Press, 1967.

Hoeffding, W. "Optimum" nonparametric tests. Proceedings of the Second
Berkeley Symposium on Mathematical Statistics & Probability. Berkeley
& Los Angeles: University of California Press, 1951.

Hoel, P.G. & Peterson, R.P. A solution to the problem of optimum
classification. Annals of Mathematical Statistics, 1949, 20, 433-438.

Iman, R.L. A power study of a rank transform for the two-way classification
model when interaction may be present. The Canadian Journal of Statistics,
Section C: Applications, 1974, 2, 227-239.

Iman, R.L. & Conover, W.J. The use of the rank transform in regression.
Technometrics, 1977.

Johnson, N.L. Systems of frequency curves generated by methods of
translation. Biometrika, 1949, 36, 149-176.

Johnson, M.E. & Ramberg, J.S. Transformations of the multivariate normal
distribution with applications to simulation. Technical report
LA-UR-77-2295, Los Alamos Scientific Laboratory, New Mexico, 1977.

Koffler, S.L. & Penfield, D.A. Nonparametric discrimination procedures for
non-normal distributions. Journal of Statistical Computation & Simulation,
1979, 8, 281-299.

Kossack, C.F. & Henschke, C.I. Introduction to Statistics and Computer
Programming. San Francisco: Holden-Day, Inc., 1975.






Lachenbruch, P.A., Sneeringer, C. & Revo, L.T. Robustness of the linear and
quadratic discriminant function to certain types of non-normality.
Communications in Statistics, 1973, 1, 39-56.

Lehmann, E.L. Nonparametrics: Statistical Methods Based on Ranks. San
Francisco: Holden-Day, Inc., 1975.

Loftsgaarden, D.O. & Quesenberry, C.P. A nonparametric estimate of a
multivariate density function. Annals of Mathematical Statistics, 1965,
36, 1049-1051.

Marks, S. & Dunn, O.J. Discriminant functions when covariance matrices are
unequal. Journal of the American Statistical Association, 1974, 69, 555-559.

Marascuilo, L.A. Large sample multiple comparisons. Psychological Bulletin,
1966, 65, 280-290.

McSweeney, M. & Penfield, D.A. The normal scores test for the c-sample
problem. The British Journal of Mathematical and Statistical Psychology,
1969, 22, 177-192.

Pratt, J.W. Robustness of some procedures for the two-sample location
problem. Journal of the American Statistical Association, 1964, 59, 665-680.

Puri, M.L. Asymptotic efficiency of a class of c-sample tests. Annals of
Mathematical Statistics, 1964, 35, 102-121.

Ramberg, J.S. & Schmeiser, B.W. An approximate method for generating
symmetric random variables. Communications of the Association for Computing
Machinery, 1972, 15, 987-990.

Terry, M.E. Some rank order tests which are most powerful against specific
parametric alternatives. Annals of Mathematical Statistics, 1952, 23,
346-366.

Van der Waerden, B.L. Order tests for the two-sample problem and their
power. Indagationes Mathematicae, 1952, 14, 453-458.

Van der Waerden, B.L. Order tests for the two-sample problem. Indagationes
Mathematicae, 1953, 15, 303-316.

Van der Waerden, B.L. The computation of the X-distribution. Proceedings of
the Third Berkeley Symposium on Mathematical Statistics and Probability.
Berkeley & Los Angeles: University of California Press, 1956.

Welch, B.L. Note on discriminant functions. Biometrika, 1939, 31, 218-220.






